Abstract

Boundary detection is essential for a variety of computer 
vision tasks such as segmentation and recognition. In this 
paper we propose a unified formulation and a novel algo- 
rithm that are applicable to the detection of different types 
of boundaries, such as intensity edges, occlusion bound- 
aries or object category specific boundaries. Our formu- 
lation leads to a simple method with state-of-the-art per- 
formance and significantly lower computational cost than 
existing methods. We evaluate our algorithm on different 
types of boundaries, from low-level boundaries extracted 
in natural images, to occlusion boundaries obtained us- 
ing motion cues and RGB-D cameras, to boundaries from 
soft-segmentation. We also propose a novel method for fig- 
ure/ground soft-segmentation that can be used in conjunc- 
tion with our boundary detection method and improve its 
accuracy at almost no extra computational cost. 

1. Introduction 

Boundary detection is a fundamental problem in com- 
puter vision and has been studied since the early days of the 
field. The majority of papers on boundary detection have fo- 
cused on using only low-level cues, such as pixel intensity 
or color [ , 14, 16, 18, ]. Recent work has started exploring
the problem of boundary detection using higher-level
representations of the image, such as motion, surface and
depth cues [ ], segmentation [ ], as well as category-specific
information [ , ].

In this paper we propose a general formulation for 
boundary detection that can be applied, in principle, to the 
identification of any type of boundaries, such as general 
boundaries from low-level static cues, motion boundaries or 
category-specific boundaries (Figures 1, 6, 7). Our method 
can be seen both as a generalization of the early view of 
boundaries as step edges [11], and as a unique closed-form 
Figure 1. Detection of occlusion and motion boundaries using the
proposed generalized boundary detection method (Gb). First two 
rows: the input layers consist of color (C), soft-segmentation (S) 
[the first three dimensions are shown as RGB], and optical 
flow (OF). Last two rows: input layers are color (C), depth (D) 
and optical flow (OF). The same implementation is used for both; 
combining multiple input layers using Gb improves boundary de- 
tection. Best viewed in color. 



solution to current boundary detection problems, based on 
a straightforward mathematical formulation. 

We generalize the classical view of boundaries from 
sudden signal changes on the original low-level image input
[ , 5, 6, 10, 14, 16, 18], to a locally linear (planar or step-wise)
model on multiple layers of the input. The layers
are interpretations of the image at different levels of visual 
processing, which could be high-level (e.g., object category 
segmentation) or low-level (e.g., color or grey level inten- 
sity). 

Despite the abundance of research on boundary detec- 
tion, there is no general formulation of this problem. In 
this paper, we make the popular but implicit intuition of 
boundaries explicit: boundary pixels mark the transition 
from one relatively constant region to another, in appro- 
priate interpretations of the image. Thus, while the region 
constancy assumption may only apply weakly for low-level 
input such as pixel intensity, it will also be weakly observed 
in higher-level interpretation layers of the image. General- 
ized boundary detection aims to exploit such weak signals 
across multiple layers in a principled manner. We could say 
that boundaries do not exist in the raw image, but rather 
in the multiple interpretation layers of that image. We can 
summarize our assumptions as follows: 

1. A boundary separates different image regions, which 
in the absence of noise are almost constant, at some 
level of image interpretation or processing. For exam- 
ple, at the lowest level, a region could have a constant 
intensity. At a higher level, it could be a region delimiting
an object category, in which case the output of
a category-specific classifier would be constant. 

2. For a given image, boundaries in one layer often coin- 
cide, in terms of position and orientation, with bound- 
aries in other layers. For example, discontinuities in 
intensity are typically correlated with discontinuities 
in optical flow, texture or other cues. Moreover, the 
boundaries that align across multiple layers often cor- 
respond to the semantic boundaries that are primar- 
ily of interest to humans: the so-called "ground-truth 
boundaries". 

Based on these observations, we develop a unified model, 
which can simultaneously consider both low-level and 
higher-level information. 

Classical vector-valued techniques on multi-images [6,
10, 11] can be simultaneously applied to several image channels,
but differ from the proposed approach in a fundamen-
tal way: they are specifically designed for low-level in- 
put, by using first or second-order derivatives of the image 
channels, with edge models limited to very small neighbor- 
hoods of only a few pixels (for approximating the deriva- 
tives). We argue that in order to correctly incorporate 
higher-level information, one must go beyond a few pix- 
els, to much larger neighborhoods, in line with more recent 
methods [1, 15, 17, 19]. First, even though boundaries from 
one layer coincide with edges from a different layer, they 



cannot be required to match perfectly in location. Second, 
boundaries, especially in higher-level layers, do not have to 
correspond to sudden changes. They could be smooth tran- 
sitions over larger regions and exhibit significant noise that 
would corrupt any local gradient computation. That is why 
we advocate a linear boundary model rather than one based 
on noisy estimation of derivatives, as discussed in the next 
section. 

Another drawback of traditional multi-image techniques 
is the issue of channel scaling, where the algorithms require 
considerable manual tuning. Consistent with current ma- 
chine learning based approaches [1,7, 15], the parameters in 
our proposed method are automatically learned using real- 
world datasets. However, our method has better computa- 
tional complexity and employs far fewer parameters. This 
allows us to learn efficiently from limited quantities of data 
without overfitting. 

Another important advantage of our approach over cur- 
rent methods is in the closed-form computation of the 
boundary orientation. The idea behind Pb [ ] is to clas- 
sify each possible boundary pixel based on the histogram 
difference in color and texture information between the two 
half disks on either side of a potential orientation, for a fixed 
number of candidate angles (e.g., 8). The separate computa- 
tion for each orientation significantly increases the compu- 
tational cost and limits orientation estimates to a particular 
granularity. 

We summarize our contributions as follows: 1) we 
present a closed-form formulation of generalized boundary 
detection that is computationally efficient; 2) we recover ex- 
act boundary normals through direct estimation rather than 
evaluating coarsely sampled orientation candidates; 3) as 
opposed to current approaches [ , ], our unified frame- 
work treats both low-level pixel data and higher-level inter- 
pretations equally and can easily incorporate outputs from 
new image interpretation algorithms; and 4) our method re- 
quires learning only a single parameter per layer, which en- 
ables efficient training with limited data. We demonstrate 
the strength of our method on a variety of real-world tasks. 

2. Problem Formulation 

For a given $N_x \times N_y$ image $I$, let the $k$-th layer $L_k$ be
some real-valued array, of the same size, associated with
$I$, whose boundaries are relevant to our task. For example,
$L_k$ could contain, at each pixel, the real-valued output of
a patch-based binary classifier trained to detect man-made
structures or respond to a particular texture or color
distribution.¹ Thus, $L_k$ will consist of relatively constant regions
(modulo classifier error) separated by boundaries. Note that
the raw pixels in the corresponding regions of the original
image may not be constant.



¹ The output of a discrete-valued multi-class classifier can be encoded
as multiple input layers, with each layer representing a given label.
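
The encoding mentioned in this footnote can be sketched as follows. This is only an illustrative sketch: the function name `labels_to_layers` and the tiny label map are hypothetical, and the paper does not prescribe a particular implementation.

```python
def labels_to_layers(label_map, num_labels):
    """Encode an integer label map as one binary layer per label.

    label_map is a 2D list of integer labels in range(num_labels);
    the result is a list of num_labels 2D lists of 0.0 / 1.0 values.
    """
    return [
        [[1.0 if label == k else 0.0 for label in row] for row in label_map]
        for k in range(num_labels)
    ]


# Hypothetical 2 x 3 output of a three-class classifier.
label_map = [[0, 0, 1],
             [0, 2, 1]]
layers = labels_to_layers(label_map, 3)
```

Each resulting layer is constant inside a labelled region, which is the property the boundary model relies on.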



Unlike some previous approaches, we expect that bound- 
aries in different layers may not precisely align. Given a 
set of layers, each corresponding to a particular interpreta- 
tion level of the image, we wish to identify the most con- 
sistent boundaries across multiple layers. The output of our 
method for each point $p$ on the $N_x \times N_y$ image grid is a
real-valued probability that $p$ lies on a boundary, given the
information in all image interpretations $L_k$ centered
at $p$.

We model a boundary point in layer $L_k$ as a transition
(either sudden or gradual) in the corresponding values of
$L_k$ along the normal to the boundary. If $K$ such layers
are available, let $L$ be a three-dimensional array of size
$N_x \times N_y \times K$, such that $L(x, y, k) = L_k(x, y)$ for each
$k$. Thus, $L$ contains all the relevant information for the
current boundary detection problem, given multiple
interpretations of the image or video. Figure 1 illustrates how we
improve the accuracy of boundary detection by combining
different useful layers of information, such as color,
soft-segmentation and optical flow, in a single representation $L$.
Let $p_0$ be the center of a window $W(p_0)$ of size
$\sqrt{N_W} \times \sqrt{N_W}$. For each image location $p_0$ we want to evaluate the
probability of boundary using the information from $L$,
limited to that particular window. For any $p$ within the window,
we make the following approximation, which gives our
locally linear boundary model:

$$L_k(p) \approx C_k(p_0) + b_k(p_0)\,(p_\epsilon - p_0)^T n(p_0). \qquad (1)$$

Here $b_k$ is nonnegative and corresponds to the boundary
"height" for layer $k$ at location $p_0$; $p_\epsilon$ is the closest point to
$p$ (projection of $p$) on the disk of radius $\epsilon$ centered at $p_0$;
$n(p_0)$ is the normal to the boundary and $C_k(p_0)$ is a
constant over the window $W(p_0)$. This constant is useful for
constructing our model (see Figure 2), but its value is
unimportant, since it cancels out, as shown below. Note that if
we set $C_k(p_0) = L_k(p_0)$ and use a sufficiently large $\epsilon$ such
that $p_\epsilon = p$, our model reduces to the first-order Taylor
expansion of $L_k(p)$ around the current $p_0$.
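
To make the role of the radius concrete, the following minimal sketch evaluates the locally linear model of Equation 1 for a single layer. The function names and the numeric values are hypothetical and chosen only for illustration; the code assumes the projection onto the disk is the usual radial clipping.

```python
import math


def project_to_disk(p, p0, eps):
    """Return the closest point to p on the disk of radius eps centered at p0."""
    dx, dy = p[0] - p0[0], p[1] - p0[1]
    dist = math.hypot(dx, dy)
    if dist <= eps:
        return p  # p already lies inside the disk
    scale = eps / dist
    return (p0[0] + scale * dx, p0[1] + scale * dy)


def model_value(p, p0, c, b, normal, eps):
    """Evaluate c + b * (p_eps - p0)^T n for one layer."""
    pe = project_to_disk(p, p0, eps)
    return c + b * ((pe[0] - p0[0]) * normal[0] + (pe[1] - p0[1]) * normal[1])


# Hypothetical layer with constant 0.5, height 0.1 and normal along x.
p0, normal = (0.0, 0.0), (1.0, 0.0)
step_like = model_value((3.0, 0.0), p0, 0.5, 0.1, normal, eps=1.0)   # clipped at radius 1
planar = model_value((3.0, 0.0), p0, 0.5, 0.1, normal, eps=10.0)     # no clipping
on_boundary = model_value((0.0, 5.0), p0, 0.5, 0.1, normal, eps=1.0)
```

With a small radius the value saturates away from the center (a step-like profile), while a radius larger than the window leaves the model planar.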



[Figure 2 annotations: $L(p) = L(p_\epsilon)$; $C = L(p_0)$; $L(p) - C = b\,(p_\epsilon - p_0)$]

Figure 2. Simplified 1-dimensional view of our generalized boundary
model. $\epsilon$ controls the region where the model is linear. For
points outside that region the layer is assumed to be roughly
constant.



As shown in Figures 2 and 3, $\epsilon$ controls the steepness of
the boundary, going from completely planar when $\epsilon$ is large




Figure 3. Our boundary model for different values of $\epsilon$ relative to
the window size $W$: a) $\epsilon > W$; b) $\epsilon = W/2$; c) $\epsilon = W/1000$.
When $\epsilon$ approaches zero the boundary model becomes a step
(along the normal direction passing through the window center).



(first-order Taylor expansion) to a sharp step-wise discontinuity
through the window center $p_0$, as $\epsilon$ approaches zero.
More precisely, when $\epsilon$ is very small we have a step along
the normal through the window center, and a sigmoid which
flattens as we get farther from the center, along the
boundary normal. As $\epsilon$ increases, the model flattens to become a
perfect plane for any $\epsilon$ that is larger than the window radius.

When the window is far from any boundary, the value
of $b_k$ will be near zero, since the only variation in the layer
values is due to noise. If we are close to a boundary, then $b_k$
will become positive and large. The term $(p_\epsilon - p_0)^T n(p_0)$
approximates the sign which indicates the side of the
boundary: it does not matter on which side we are, as long as a
sign change occurs when the boundary is crossed.

When a true boundary is present within several layers at
the same position (i.e., $b_k(p_0)$ is non-zero, and possibly
different, for several $k$), the normal to the boundary should
be consistent. Thus, we model the boundary normal $n$ as
common across all layers.

We can now write the above equation in matrix form for
all layers, with the same window size and location, as
follows. Let $X$ be an $N_W \times K$ matrix with a row $i$ for each
location $p_i$ of the window and a column for each layer $k$,
such that $X_{i,k} = L_k(p_i)$. Similarly, we define an $N_W \times 2$
position matrix $P$: on its $i$-th row we store the $x$ and $y$
components of $(p_\epsilon - p_0)$ for the $i$-th point of the
window. Let $n = [n_x, n_y]$ denote the boundary normal and
$b = [b_1, b_2, \ldots, b_K]$ the step sizes for layers $1, 2, \ldots, K$.
Also, let us define the rank-1 $2 \times K$ matrix $J = n^T b$. We
also define matrix $C$ of the same size as $X$, with each
column $k$ constant and equal to $C_k(p_0)$.

We can then rewrite Equation 1 as follows (dropping
the dependency on $p_0$ for notational simplicity), with
unknowns $J$ and $C$:

$$X \approx C + PJ. \qquad (2)$$



Since $C$ is a matrix with constant columns, and each
column of $P$ sums to 0, we have $P^T C = 0$. Thus, by
multiplying both sides of the equation above by $P^T$ we can
eliminate the unknown $C$. Moreover, it can be easily shown that
$P^T P = \alpha I$, i.e., the identity matrix scaled by a factor $\alpha$,
which can be computed since $P$ is known. We finally
obtain a simple expression for the unknown $J$ (since both $P$
and $X$ are known):

$$J \approx \frac{1}{\alpha} P^T X. \qquad (3)$$
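
The closed-form estimate in Equation 3 can be checked numerically on synthetic data. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a square window, a radius $\epsilon$ larger than the window (so positions are not clipped), no Gaussian weighting, and hypothetical values for the normal, step heights and constants.

```python
def window_offsets(radius):
    """Rows of P for a square window, assuming eps exceeds the window radius."""
    return [(float(dx), float(dy))
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def estimate_J(P, X):
    """Return (J, alpha) with J = (1/alpha) P^T X stored as a 2 x K nested list."""
    alpha = sum(px * px for px, _ in P)  # diagonal entry of P^T P
    num_layers = len(X[0])
    J = [[0.0] * num_layers for _ in range(2)]
    for (px, py), row in zip(P, X):
        for k, value in enumerate(row):
            J[0][k] += px * value
            J[1][k] += py * value
    return [[entry / alpha for entry in row] for row in J], alpha


# Hypothetical synthetic window: two layers that follow Equation 1 exactly.
normal = (0.6, 0.8)         # unit boundary normal n
steps = (2.0, 0.5)          # boundary heights b_k
constants = (10.0, -3.0)    # per-layer constants C_k
P = window_offsets(2)
X = [[c + b * (px * normal[0] + py * normal[1]) for b, c in zip(steps, constants)]
     for px, py in P]
J, alpha = estimate_J(P, X)
```

Because the columns of P sum to zero, the constants drop out and J equals the outer product of the normal and the step heights, whatever values the constants take.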

Since $J = n^T b$ it follows that $J J^T = \|b\|^2 n^T n$ is
symmetric and has rank 1. Then $n$ can be estimated as the
principal eigenvector of $M = J J^T$ and $\|b\|^2$ as its largest
eigenvalue. $\|b\|$, which is obtained as the square root of the
largest eigenvalue of $M$, is the norm of the boundary steps
vector $b = [b_1, b_2, \ldots, b_K]$. This norm captures the
overall strength of boundaries from all layers simultaneously. If
layers are properly scaled, then $\|b\|$ could be used as a
measure of boundary strength. Besides the intuitive meaning of
$\|b\|$, the spectral approach to boundary estimation is also
related to the gradient of multi-images previously used for
low-level color edge detection in classical papers such
as [6, 10]. However, it is important to notice that unlike
those methods, we do not compute derivatives, as they are
not appropriate for higher-level layers and can be noisy for
low-level layers. Instead, we fit a model, which, by
controlling $\epsilon$, can vary from planar to sigmoid/step-wise. For
smoother-looking results, in practice we weight the rows of
matrices $X$ and $P$ by a 2D Gaussian with the mean set to
the window center $p_0$ and the standard deviation equal to
half of the window radius.
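
Since $M$ is a symmetric $2 \times 2$ matrix, its principal eigenpair has a closed form. The following sketch shows one standard way to compute it; the function names and the example matrix $J$ are hypothetical, and the recovered normal is only defined up to sign.

```python
import math


def principal_eigenpair_2x2(m00, m01, m11):
    """Largest eigenvalue and a unit eigenvector of [[m00, m01], [m01, m11]]."""
    half_trace = 0.5 * (m00 + m11)
    root = math.hypot(0.5 * (m00 - m11), m01)
    lam = half_trace + root
    # Orientation of the principal axis of a symmetric 2 x 2 matrix.
    angle = 0.5 * math.atan2(2.0 * m01, m00 - m11)
    return lam, (math.cos(angle), math.sin(angle))


def normal_and_strength(J):
    """Recover the boundary normal n and the strength ||b|| from a 2 x K matrix J."""
    m00 = sum(v * v for v in J[0])
    m01 = sum(a * c for a, c in zip(J[0], J[1]))
    m11 = sum(v * v for v in J[1])
    lam, n = principal_eigenpair_2x2(m00, m01, m11)
    return n, math.sqrt(lam)


# Hypothetical J = n^T b with n = (0.6, 0.8) and b = (2.0, 0.5).
J = [[1.2, 0.3],
     [1.6, 0.4]]
n, strength = normal_and_strength(J)
```

For this example the recovered strength is the norm of (2.0, 0.5), and the recovered normal matches (0.6, 0.8).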

Once we identify ||b||, we pass it through a one- 
dimensional logistic model to obtain the probability of 
boundary, similarly to recent classification approaches to 
boundary detection [ , ]. The parameters of the logis- 
tic regression model are learned using standard procedures. 
The normal to the boundary n is then used for non-maxima 
suppression. 
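
The logistic mapping from boundary strength to probability can be sketched as below, following the form used in Algorithm 1. The weight values are hypothetical placeholders; in the paper they are learned from training data.

```python
import math


def boundary_probability(strength, w0, w1):
    """Map the boundary strength ||b|| to a probability with a 1D logistic model."""
    return 1.0 / (1.0 + math.exp(w0 + w1 * strength))


# Hypothetical weights: a negative w1 makes the probability grow with strength.
w0, w1 = 4.0, -2.0
weak = boundary_probability(0.0, w0, w1)
medium = boundary_probability(2.0, w0, w1)
strong = boundary_probability(4.0, w0, w1)
```

With these placeholder weights a strength of 2.0 sits exactly at the decision midpoint, and the probability increases monotonically with strength.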

3. Algorithm and Numerical Considerations 

Before applying the main algorithm we scale each layer 
in L according to its importance, which may be problem de- 
pendent. For example, in Figure 1, it is clear that when re- 
covering occlusion boundaries, the optical flow layer (OF) 
should contribute more than the raw color (C) and color- 
based soft segmentation (S) layers. The images displayed 
are from the dataset of Stein and Hebert [ ]. The optical 
flow shown is an average between the flow { '3} computed 
over two pairs of images: (reference frame, first frame), and 
(reference frame, last frame). We learn the correct scaling
of the layers from training data using a standard
unconstrained nonlinear optimization procedure (e.g., the
fminsearch routine in MATLAB) on the average F-measure of
the training set. We apply the same learning procedure in all 
of our experiments. This is computationally feasible since 
there is only one parameter per layer in the proposed model. 

Algorithm 1 (referred to as Gb1) summarizes the proposed
approach. The overall complexity of our method is


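Algorithm 1 Gb1: Fast Generalized Boundary Detection
  Initialize $L$, scaled appropriately.
  Initialize $w_0$ and $w_1$.
  for all pixels $p$ do
    $M \leftarrow (P^T X_p)(P^T X_p)^T$
    $(v, \lambda) \leftarrow$ principal eigenpair of $M$
    $b_p \leftarrow \frac{1}{1 + \exp(w_0 + w_1 \sqrt{\lambda})}$
    $\theta_p \leftarrow \operatorname{atan2}(v_y, v_x)$
  end for
  return $b$, $\theta$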



relatively straightforward to compute. For each pixel $p$,
the most expensive step is the computation of the matrix
$M$, which takes $O((N_W + 2)K)$ steps ($N_W$ is the
number of pixels in the window, and $K$ is the number of
layers). Since $M$ is always $2 \times 2$, computing its eigenpair
$(v, \lambda)$ is a closed-form operation, with a small fixed cost.
It follows that for a fixed window size $N_W$ and a total of
$N$ pixels per image the overall complexity of our algorithm
is $O(K N_W N)$. If $N_W$ is a constant fraction $f$ of $N$, then
the complexity becomes $O(f K N^2)$.
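
The per-pixel loop whose cost is analyzed above can be sketched in plain Python as follows. This is a simplified illustration under several assumptions that are not taken from the paper: a square window, $\epsilon$ larger than the window radius, no Gaussian weighting, border pixels left at zero, and hypothetical logistic weights. It also normalizes by $\alpha$ as in Equation 3, which only rescales the eigenvalue.

```python
import math


def gb1(layers, radius, w0, w1):
    """Naive Gb1 sketch; layers is a list of K equally sized 2D lists."""
    height, width = len(layers[0]), len(layers[0][0])
    prob = [[0.0] * width for _ in range(height)]
    theta = [[0.0] * width for _ in range(height)]
    offsets = [(dx, dy) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    alpha = float(sum(dx * dx for dx, _ in offsets))
    for y in range(radius, height - radius):
        for x in range(radius, width - radius):
            m00 = m01 = m11 = 0.0
            for layer in layers:  # O(N_W) work per layer
                jx = jy = 0.0
                for dx, dy in offsets:
                    value = layer[y + dy][x + dx]
                    jx += dx * value
                    jy += dy * value
                jx /= alpha
                jy /= alpha
                m00 += jx * jx  # accumulate M = J J^T
                m01 += jx * jy
                m11 += jy * jy
            # Closed-form principal eigenpair of the symmetric 2 x 2 matrix M.
            lam = 0.5 * (m00 + m11) + math.hypot(0.5 * (m00 - m11), m01)
            angle = 0.5 * math.atan2(2.0 * m01, m00 - m11)
            prob[y][x] = 1.0 / (1.0 + math.exp(w0 + w1 * math.sqrt(lam)))
            theta[y][x] = angle  # orientation of the boundary normal
    return prob, theta


# Hypothetical input: two layers sharing a vertical step between x = 3 and x = 4.
step = [[1.0 if x >= 4 else 0.0 for x in range(7)] for _ in range(7)]
layers = [step, [[2.0 * v for v in row] for row in step]]
prob, theta = gb1(layers, radius=1, w0=4.0, w1=-4.0)
```

On this toy input the probability peaks next to the step and the estimated normal points along the x axis, while flat regions keep a low response.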

Thus, the running time of Gb1 compares very favorably
to that of the Pb algorithm [ , 15], which in its exact form
has complexity $O(f K N_o N^2)$, where $N_o$ is a discrete
number of candidate orientations. An approximation is
proposed in [ ] with $O(f K N_o N_b N)$ complexity, where $N_b$ is
the number of histogram bins for the different image
channels. However, $N_o N_b$ is large in practice and significantly
affects the overall running time.

We also propose a faster version of our algorithm, Gb2,
with complexity $O(KN)$, that is linear in the number of
image pixels. The speed-up is achieved by computing $M$
at a constant cost (independent of the number of pixels in
the window). When $\epsilon$ is large and no Gaussian weighting is
applied, we have $P^T X_p = P_p^T X_p - P_0^T X_p$, where $P_p$ is
the matrix of absolute positions for each pixel $p$ and $P_0$ is
a matrix with two constant columns equal to the 2D
coordinates of the window center. Upon closer inspection, we note
that both $P_p^T X_p$ and $P_0^T X_p$ can be computed in constant
time by using integral images, for each layer separately. We
implemented the faster version of our algorithm, Gb2, and
verified experimentally that it is linear in the number of
pixels per image, independent of the window size (Figure 4).
The output of Gb2 is similar to Gb1 (see Table 1), and
provably identical when $\epsilon$ is larger than the window radius and
no Gaussian weighting is applied. The weighting can be
approximated by running Gb2 at multiple scales and
combining the results.
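
The integral-image trick can be sketched for a single layer as follows. This is an illustrative sketch rather than the authors' code: all function names are hypothetical, the window is assumed square and fully inside the image, and a brute-force routine is included only to check the constant-time result.

```python
def integral_image(values):
    """Summed-area table with an extra zero row and column."""
    height, width = len(values), len(values[0])
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += values[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, x0, y0, x1, y1):
    """Sum over the inclusive rectangle [x0, x1] x [y0, y1] in constant time."""
    return (table[y1 + 1][x1 + 1] - table[y0][x1 + 1]
            - table[y1 + 1][x0] + table[y0][x0])


def make_tables(layer):
    """Integral images of the layer and of the layer weighted by x and by y."""
    height, width = len(layer), len(layer[0])
    x_weighted = [[x * layer[y][x] for x in range(width)] for y in range(height)]
    y_weighted = [[y * layer[y][x] for x in range(width)] for y in range(height)]
    return integral_image(layer), integral_image(x_weighted), integral_image(y_weighted)


def ptx_constant_time(tables, cx, cy, radius):
    """One column of P^T X, computed as P_p^T X - P_0^T X with box sums."""
    plain, x_table, y_table = tables
    box = (cx - radius, cy - radius, cx + radius, cy + radius)
    total = box_sum(plain, *box)
    return (box_sum(x_table, *box) - cx * total,
            box_sum(y_table, *box) - cy * total)


def ptx_brute_force(layer, cx, cy, radius):
    """Reference computation that visits every pixel of the window."""
    sx = sy = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            sx += (x - cx) * layer[y][x]
            sy += (y - cy) * layer[y][x]
    return (sx, sy)


# Hypothetical 5 x 6 layer with small integer-valued entries.
layer = [[float((x * x + 3 * y) % 7) for x in range(6)] for y in range(5)]
tables = make_tables(layer)
fast = ptx_constant_time(tables, 2, 2, 2)
slow = ptx_brute_force(layer, 2, 2, 2)
```

After the tables are built once per layer, each window costs a fixed number of lookups regardless of its radius, which is what makes the overall cost linear in the number of pixels.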

In Figure 4 we present a comparison of the running times
of edge detection in MATLAB of the three algorithms (Gb1,
Gb2 and Pb [ ]) vs. the number of pixels per image.²

² Our optimized C++ implementation of Gb1 is an order of magnitude
faster than its MATLAB version.




formed by a composition of regions of uniform color distri- 
butions, then we can consider c to be a multi-dimensional 
random variable drawn from a mixture (linear combina- 
tion) of color distributions corresponding to the image 
regions: 
Figure 4. Edge detection running times on a 3.2 GHz desktop
of our non-optimized MATLAB implementation of Gb1 and Gb2
vs. the publicly available code of Pb [ ]. Each algorithm uses the 
same window radius, whose number of pixels is a constant fraction 
of the total number of image pixels. Gb2 is linear in the number 
of image pixels (independent of the window size). The accuracy 
of all algorithms is similar. 



It is important to note that while our algorithm is fast, 
obtaining some of the layers may be slow, depending on the 
image processing required. If we only use low-level inter- 
pretations, such as raw color or depth (e.g., from an RGB- 
D camera) then the total execution time is small, even for 
a MATLAB implementation. In the next section, we pro- 
pose an efficient method for color-based soft-segmentation 
of images that works well with our algorithm. More com- 
plex, higher-level inputs, such as class-specific segmenta- 
tions naturally increase the total running time. 

4. An Efficient Soft-Segmentation Method

In this section we present a novel method to rapidly gen- 
erate soft figure/ground image segmentations. Its soft con- 
tinuous output is similar to the eigenvectors computed by 
normalized cuts [ ] or the soft figure/ground assignment 
obtained by alpha-matting [ ], but it is much faster than 
most existing segmentation methods. We describe it here 
because it serves as a fast mid-level interpretation of the 
image that significantly improves accuracy over raw color 
alone. 

While we describe our approach in the context of color 
information, the proposed method is general enough to handle
a variety of other types of low-level information as well.
The method is motivated by the observation that regions of 
semantic interest (such as objects) can often be modeled 
with a relatively uniform color distribution. Specifically, 
we assume that the colors of any image patch are generated 
from a distribution that is a linear combination (or mixture) 
of a finite number of color probability distributions belong- 
ing to the regions of interest/objects in the image. 

Let $c$ be an indicator vector associated with some patch
from the image, such that $c_i = 1$ if color $i$ is present in
the patch and $0$ otherwise. If we assume that the image is



(4) 



The linear subspace of color distributions can be au- 
tomatically discovered by performing PCA on collections 
of such indicator vectors c, sampled uniformly from the 
image. This idea deserves a further in-depth discussion 
but, due to space limitations, in this paper we outline just 
the main idea, without presenting our detailed probabilistic 
analysis. 

Once the subspace is discovered using PCA, for any
patch sampled from the image and its associated indicator
vector $c$, its generating distribution (considered to be the
distribution of the foreground) can be reconstructed from
the linear subspace using the usual PCA reconstruction
approximation: $h_F(c) \approx h_0 + \sum_i ((c - h_0)^T v_i)\, v_i$. The
distribution of the background is also obtained from the PCA
model using the same coefficients, but with opposite sign.
As expected, we obtain a background distribution that is as
far as possible (in the subspace) from the distribution of the
foreground: $h_B(c) \approx h_0 - \sum_i ((c - h_0)^T v_i)\, v_i$.
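
The foreground and background reconstructions can be sketched as follows, assuming the PCA mean $h_0$ and an orthonormal set of principal components $v_i$ are already available. The function name and the toy four-color example are hypothetical and serve only to show the sign flip between the two distributions.

```python
def reconstruct_figure_ground(c, h0, components):
    """Return (h_F, h_B) as h0 plus / minus the projection of (c - h0) on the subspace.

    components is assumed to be a list of orthonormal principal directions.
    """
    centered = [ci - hi for ci, hi in zip(c, h0)]
    projection = [0.0] * len(c)
    for v in components:
        coeff = sum(d * vi for d, vi in zip(centered, v))  # (c - h0)^T v_i
        for j, vj in enumerate(v):
            projection[j] += coeff * vj
    h_fg = [h + p for h, p in zip(h0, projection)]
    h_bg = [h - p for h, p in zip(h0, projection)]
    return h_fg, h_bg


# Hypothetical example with four colors and a single unit-norm component.
h0 = [0.5, 0.5, 0.5, 0.5]
components = [[0.5, 0.5, -0.5, -0.5]]
c = [1.0, 1.0, 0.0, 0.0]  # the patch contains only the first two colors
h_fg, h_bg = reconstruct_figure_ground(c, h0, components)
```

In this toy case the foreground estimate concentrates on the two colors present in the patch, and the background estimate concentrates on the other two.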

Using the figure/ground distributions obtained in this
manner, we classify each point in the image as either
belonging or not to the same region as the current patch. If we
perform the same classification procedure for $n_s$ ($\approx 150$)
locations uniformly sampled on the image grid, we obtain
$n_s$ figure/ground segmentations for the same image. As
a final step, we again perform PCA on vectors collected
from all pixels in the image; each vector is of dimension
$n_s$ and corresponds to a certain image pixel, such that its
$j$-th element is equal to the value at that pixel in the $j$-th
figure/ground segmentation. Finally we perform PCA
reconstruction using the first 8 principal components, and
obtain a set of 8 soft-segmentations which are a compressed
version of the entire set of $n_s$ segmentations. These
soft-segmentations are used as input layers to our boundary
detection method, and are similar in spirit to the normalized
cuts eigenvectors computed for gPb [1].

In Figure 5 we show examples of the first three such soft- 
segmentations on the RGB color channels. This method 
takes less than 3 seconds in MATLAB on a 3.2GHz desktop 
computer for a 300 x 200 color image. 

5. Experimental analysis 

To evaluate the generality of our proposed method, we 
conduct experiments on detecting boundaries in image, 
video and RGB-D data on both standard and new datasets. 
First, we test our method on static color images for which 




Table 1. Comparisons of accuracy (F-measure) and computational
time between our method and two other popular methods on the BSDS
dataset. We use two versions of the proposed method: Gb1 (S)
uses color and soft-segmentations as input layers, while Gb1 uses
only color. Color layers are represented in CIE Lab space.

Algorithm     Gb1 (S)   Gb1    Gb2    Pb [15]   Canny [3]
F-measure     0.67      0.65   0.64   0.65      0.58
Time (sec)    8         3      2      20        0.1



Table 2. Performance comparison on the CMU Motion Dataset of
current techniques for occlusion boundary detection.

Algorithm               F-measure
Gb1                     0.63
He et al. [9]           0.47
Sargin et al. [20]      0.57
Stein et al. [22]       0.48
Sundberg et al. [24]    0.62



we only use the local color information. Second, we per- 
form experiments on occlusion boundary detection in short 
video clips. Multiple frames, closely spaced in time, pro- 
vide significantly more information about dynamic scenes 
and make occlusion boundary detection possible, as shown 
in recent work [9,20,22,24] . Third, we also experiment with 
RGB-D images of people and show that the depth layer can 
be effectively used for detecting occlusions. In the fourth 
set of experiments we use the CPMC method [4] to generate
figure/ground category segments on the PASCAL2011
dataset. We show how it can be effectively used to gener- 
ate image layers that can produce high-quality boundaries 
when processed using our method. 

5.1. Boundaries in Static Color Images 

We evaluate our proposed method on the well-known 
BSDS300 benchmark [ ]. We compare the accuracy and 
computational time of Gb with Pb [15] and the Canny [3] edge
detector. All algorithms use only local information at a 
single scale. Canny uses brightness information, Gb uses 
brightness and color, while Pb uses brightness, color and 
texture information. Table 1 summarizes the results. Note 



that our method is much faster than Pb (times are
averages in MATLAB on the same 3.2 GHz desktop computer).
When no texture information is used for Pb, its accuracy
drops significantly while the computational time remains
high (≈ 16 seconds).

5.2. Occlusion Boundaries in Video 

Occlusion boundary detection is an important problem 
and has received increasing attention in computer vision. 
Current state-of-the-art techniques are based on the com- 
putation of optical flow combined with a global processing 
phase [9,20,22,24]. We evaluate our approach on the CMU 
Motion Dataset [''''] and compare our method with pub- 
lished results on the same dataset (summarized in Table 2). 
Optical flow is an important cue for detecting occlusions in 
video; we use Sun et al.'s publicly available code ["•-]. In 
addition to optical flow, we provided Gb1 with two additional
layers: color and our soft segmentation (Section 4).
In contrast to the other methods [9, 20, 22, 24], which
require significant time for processing and optimization, Gb
requires less than 4 seconds on average (aside from the
external optical flow routine) to process images (230 x 320)



Table 3. Average F-measure on 100 test RGB-D frames of the Gb1
algorithm, using different layers: color (C), depth (D) and optical
flow (OF). The performance improves as more layers are
combined. Note: the reported time for C+OF and C+D+OF does not
include that of generating optical flow using an external module.

Layers        C+OF   C+D    C+D+OF
F-measure     0.41   0.58   0.61
Time (sec)    5      4      6



from the CMU dataset. 

5.3. Occlusion Boundaries in RGB-D Video 

The third set of experiments uses RGB-D video clips
of people performing different actions. We combine the 
low-level color and depth input with large-displacement op- 
tical flow [ ], which is useful for large inter-frame body 
movements. Figure 1 shows an example of the input lay- 
ers and the output of our method. The depth layer was 
pre-processed to retain the largest connected component of 
pixels at a similar depth, so as to cover the main subject per- 
forming actions. Table 3 summarizes boundary detection in 
RGB-D on our dataset of 74 training and 100 testing im- 
ages. We see that Gb can effectively combine information 
from color (C), optical flow (OF) and depth (D) layers to 
achieve better results. Figure 6 shows sample qualitative
results for Gb using only the basic color and depth informa- 
tion (without pre-processing of the depth layer). Without 
optical flow, the total computation time for boundary detec- 
tion is less than 4 seconds per image in MATLAB. 

5.4. Boundaries from soft-segmentations 

Our previous experiments use our soft-segmentation 
method as one of the input layers for Gb. In all of our exper- 
iments, we find the mid-level layer information provided by 
soft-segmentations significantly improves the accuracy of 
Gb. 

The PCA reconstruction procedure described in Sec- 
tion 4 can also be applied to a large pool of fig- 
ure/ground segments, such as those generated by the CPMC 
method [4]. This enables us to achieve an F-measure of 0.70
on BSDS300, which matches the performance of gPb [1].
CPMC+Gb also gives very promising results on the
PASCAL2011 dataset, as evidenced by the examples in
Figure 7. These preliminary results indicate that fusing
evidence from color and soft-segmentation using Gb is a
promising avenue for further research. 

6. Conclusions 

We present Gb, a novel model and algorithm for general- 
ized boundary detection. Our method effectively combines 

^ We will release this dataset to enable direct comparisons. 



[Figure 7 panels: Color (C); Soft Segmentation (S); Gb1 (C+S)]




Figure 7. Qualitative results using Gb on PASCAL2011 images, 
from color and soft-segmentations obtained from the output of 
CPMC [4]. Best viewed on the screen. 



multiple low- and high-level interpretation layers of an in- 
put image in a principled manner to achieve state-of-the- 
art accuracy on standard datasets at a significantly lower 
computational cost than competing methods. Gb's broad 
real-world applicability is demonstrated through qualitative 
and quantitative results on detecting semantic boundaries 
in natural images, occlusion boundaries in video and ob- 
ject boundaries in RGB-D data. We also propose a second, 
even more efficient variant of Gb, with asymptotic compu- 
tational complexity that is linear with image size. Addition- 
ally, we introduce a practical method for fast generation of 
soft-segmentations, using either PCA dimensionality reduc- 
tion on data collected from image patches or a large pool of 
figure/ground segments. We also demonstrate experimen- 
tally that our soft-segmentations are valuable mid-level in- 
terpretations for boundary detection. 